Broad coverage paragraph segmentation across languages and domains
Caroline Sporleder, Mirella Lapata 

July 2006 ACM Transactions on Speech and Language Processing (TSLP), Volume 3 Issue 2

Publisher: ACM Press 


This article considers the problem of automatic paragraph segmentation. The task is 
relevant for speech-to-text applications whose output transcripts do not usually contain
punctuation or paragraph indentation and are naturally difficult to read and process.
Text-to-text generation applications (e.g., summarization) could also benefit from an
automatic paragraph segmentation mechanism which indicates topic shifts and provides visual
targets to the reader. We present a paragraph segmentation mod ... 



Keywords: Machine learning, paragraph breaks, segmentation, summarization 









A compression-based algorithm for Chinese word segmentation

W. J. Teahan, Rodger McNab, Yingying Wen, Ian H. Witten 
September 2000 Computational Linguistics, Volume 26 Issue 3
Publisher: MIT Press 

Chinese is written without using spaces or other word delimiters. Although a text may be 
thought of as a corresponding sequence of words, there is considerable ambiguity in the 
placement of boundaries. Interpreting a text as a sequence of words is beneficial for some 
information retrieval and storage tasks: for example, full-text search, word-based
compression, and keyphrase extraction. We describe a scheme that infers appropriate 
positions for word boundaries using an adaptive language model that ... 

A computational grammar of discourse-neutral prosodic phrasing in English
J. Bachenko, E. Fitzpatrick 

September 1990 Computational Linguistics, Volume 16 Issue 3 

Publisher: MIT Press 

We describe an experimental text-to-speech system that uses information about syntactic
constituency, adjacency to a verb, and constituent length to determine prosodic phrasing 
for synthetic speech. A central goal of our work has been to characterize "discourse 
neutral" phrasing, i.e. sentence-level phrasing patterns that are independent of discourse 
semantics. Our account builds on Bachenko et al. (1986), but differs in its treatment of 
clausal structure and predicate-argument relations. Result ... 



The disambiguation of nominalizations 
Maria Lapata 

September 2002 Computational Linguistics, Volume 28 Issue 3
Publisher: MIT Press 


This article addresses the interpretation of nominalizations, a particular class of compound 
nouns whose head noun is derived from a verb and whose modifier is interpreted as an 
argument of this verb. Any attempt to automatically interpret nominalizations needs to 
take into account: (a) the selectional constraints imposed by the nominalized compound 
head, (b) the fact that the relation of the modifier and the head noun can be ambiguous, 
and (c) the fact that these constraints can be easily overr ... 



Annotations and tools for an activity based spoken language corpus
Jens Allwood, Leif Gronqvist 

September 2001 Proceedings of the Second SIGdial Workshop on Discourse and 
Dialogue - Volume 16 

Publisher: Association for Computational Linguistics 


The paper contains a description of the Spoken Language Corpus of Swedish at the 
Department of Linguistics, Goteborg University (GSLC), and a summary of the various 
types of analysis and tools that have been developed for work on this corpus. Work on the 
corpus was started in the late 1970s. It is incrementally growing and presently consists of
1.3 million words from about 25 different social activities. The corpus was initiated to meet 
a growing interest in naturalistic spoken language da ... 

Document detection: Inquery system overview 
John Broglio, James P. Callan, W. Bruce Croft 

September 1993 Proceedings of a workshop on held at Fredericksburg, Virginia:
September 19-23, 1993
Publisher: Association for Computational Linguistics 


The TIPSTER project in the Information Retrieval Laboratory of the Computer Science 
Department, University of Massachusetts, Amherst (which includes MCC as a 
subcontractor), has focused on the following goals:
* Improving the effectiveness of information retrieval techniques for large, full-text databases,
* Improving the effectiveness of routing techniques appropriate for long-term information needs, and
* Demonstrating the effectiveness of these retrieval and routing techniques for ...

High-performance bilingual text alignment using statistical and dictionary information
Masahiko Haruno, Takefumi Yamazaki 

June 1996 Proceedings of the 34th annual meeting on Association for Computational 
Linguistics 

Publisher: Association for Computational Linguistics 

This paper describes an accurate and robust text alignment system for structurally 
different languages. Among structurally different languages such as Japanese and English, 
there is a limitation on the amount of word correspondences that can be statistically 
acquired. The proposed method makes use of two kinds of word correspondences in 
aligning bilingual texts. One is a bilingual dictionary of general use. The other is the word 
correspondences that are statistically acquired in the alignment pr ... 

Noun-noun compound machine translation: a feasibility study on shallow processing
Takaaki Tanaka, Timothy Baldwin 

July 2003 Proceedings of the ACL 2003 workshop on Multiword expressions: 
analysis, acquisition and treatment - Volume 18 

Publisher: Association for Computational Linguistics 


The translation of compound nouns is a major issue in machine translation due to their 
frequency of occurrence and high productivity. Various shallow methods have been 
proposed to translate compound nouns, notable amongst which are memory-based 
machine translation and word-to-word compositional machine translation. This paper 
describes the results of a feasibility study on the ability of these methods to translate 
Japanese and English noun-noun compounds. 

Techniques for automatically correcting words in text
Karen Kukich 

December 1992 ACM Computing Surveys (CSUR), Volume 24 Issue 4
Publisher: ACM Press 


Research aimed at correcting words in text has focused on three progressively more
difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3)
context-dependent word correction. In response to the first problem, efficient
pattern-matching and n-gram analysis techniques have been developed for detecting strings that
do not appear in a given word list. In response to the second problem, a variety of general 
and application-specific spelling cor ... 

Keywords: n-gram analysis, Optical Character Recognition (OCR), context-dependent 
spelling correction, grammar checking, natural-language-processing models, neural net 
classifiers, spell checking, spelling error detection, spelling error patterns, statistical- 
language models, word recognition and correction 
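Kukich's first problem, nonword detection by n-gram analysis, can be illustrated with a short sketch. It assumes the simplest variant, a non-positional table of character bigrams built from a word list, with `#` as a hypothetical word-boundary marker; the function names are illustrative and are not taken from the survey.

```python
def bigrams(word: str, boundary: str = "#") -> set[str]:
    """Return the character bigrams of a word padded with boundary markers."""
    padded = f"{boundary}{word}{boundary}"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def build_bigram_table(word_list: list[str]) -> set[str]:
    """Collect every bigram that occurs in at least one word of the word list."""
    table: set[str] = set()
    for word in word_list:
        table |= bigrams(word.lower())
    return table


def is_suspect_nonword(token: str, table: set[str]) -> bool:
    """Flag a token if it contains any bigram never seen in the word list."""
    return not bigrams(token.lower()) <= table


if __name__ == "__main__":
    table = build_bigram_table(["the", "then", "hen", "ten"])
    for token in ["hen", "thx", "teh"]:
        print(token, is_suspect_nonword(token, table))
```

Any token containing an unattested bigram is flagged; a nonword assembled entirely from attested bigrams passes unnoticed, so a table like this catches only part of the nonwords in a text.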



A state of the art of Thai language resources and Thai language behavior analysis and modeling

Asanee Kawtrakul, Mukda Suktarachan, Patcharee Varasai, Hutchatai Chanlekha 
August 2002 Proceedings of the 3rd workshop on Asian language resources and 
international standardization - Volume 12 COLING '02 

Publisher: Association for Computational Linguistics 


As electronic communication is now increasing, the term Natural Language Processing
should be considered in the broader aspect of Multi-Language processing system. 
Observation of the language behavior will provide a good basis for design of computational 
language model and also creating cost-effective solutions to the practical problems. In 
order to have a good language modeling, the language resources are necessary for the 
language behavior analysis. This paper intended to express what we have ...
11 N-gram-based machine translation

Jose B. Marino, Rafael E. Banchs, Josep M. Crego, Adria de Gispert, Patrik Lambert, Jose A.
R. Fonollosa, Marta R. Costa-jussa 

December 2006 Computational Linguistics, Volume 32 Issue 4
Publisher: MIT Press 


This article describes in detail an n-gram approach to statistical machine translation. This 
approach consists of a log-linear combination of a translation model based on n-grams of 
bilingual units, which are referred to as tuples, along with four specific feature functions. 
Translation performance, which happens to be in the state of the art, is demonstrated with 
Spanish-to-English and English-to-Spanish translations of the European Parliament Plenary 
Sessions (EPPS). 
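The log-linear combination mentioned in the abstract scores each candidate translation as a weighted sum of feature-function values and keeps the highest-scoring one. The sketch below shows only that decision rule; the feature names (`tuple_ngram`, `target_lm`) and all numbers are hypothetical placeholders, not the article's four feature functions or trained weights.

```python
def loglinear_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Return sum_m lambda_m * h_m for one hypothesis (h_m are log-domain feature values)."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(hypotheses: dict[str, dict[str, float]],
                    weights: dict[str, float]) -> str:
    """Return the candidate translation with the highest log-linear score."""
    return max(hypotheses, key=lambda hyp: loglinear_score(hypotheses[hyp], weights))


if __name__ == "__main__":
    # Hypothetical candidates with made-up log-domain feature values.
    candidates = {
        "hypothesis A": {"tuple_ngram": -2.0, "target_lm": -3.0},
        "hypothesis B": {"tuple_ngram": -2.5, "target_lm": -1.0},
    }
    print(best_hypothesis(candidates, {"tuple_ngram": 1.0, "target_lm": 1.0}))
```

Changing the weights changes which candidate wins, which is why such systems tune the weights rather than fixing them by hand.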



... syntactic links

Leonid Mitjushin 

August 1992 Proceedings of the 14th conference on Computational linguistics - 
Volume 3 

Publisher: Association for Computational Linguistics 




Searching the audio notebook: keyword search in recorded conversations

Peng Yu, Kaijiang Chen, Lie Lu, Frank Seide 

October 2005 Proceedings of the conference on Human Language Technology and
Empirical Methods in Natural Language Processing HLT '05
Publisher: Association for Computational Linguistics 


MIT's Audio Notebook added great value to the note-taking process by retaining audio
recordings, e.g. during lectures or interviews. The key was to provide users ways to 
quickly and easily access portions of interest in a recording. Several non-speech- 
recognition based techniques were employed. In this paper we present a system to search 
directly the audio recordings by key phrases. We have identified the user requirements as 
accurate ranking of phrase matches, domain independence, and reasonabl ... 



Phrase recognition and expansion for short, precision-biased queries based on a query log

Erika F. de Lima, Jan O. Pedersen 

August 1999 Proceedings of the 22nd annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '99
Publisher: ACM Press 

Paul S. Jacobs, George Krupka, Lisa Rau, Michael L. Mauldin, Teruko Mitamura, Tsuyoshi
Kitani, Ira Sider, Lois Childs 

September 1993 Proceedings of a workshop on held at Fredericksburg, Virginia:
September 19-23, 1993
Publisher: Association for Computational Linguistics 




This paper presents an overview of the TIPSTER/SHOGUN project, the major results, and 
the SHOGUN data extraction system. TIPSTER/SHOGUN was a joint effort of GE Corporate 
Research and Development, Carnegie Mellon University, and Martin Marietta Management 
and Data Systems (formerly GE Aerospace), part of the ARPA TIPSTER Text program. Two 
of the main technical thrusts of the project were: (1) the development of a model of finite- 
state approximation, in which the accuracy of more detailed models ... 



16 Using comparable corpora to solve problems difficult for human translators 
Serge Sharoff, Bogdan Babych, Anthony Hartley 

July 2006 Proceedings of the COLING/ACL on Main conference poster sessions
Publisher: Association for Computational Linguistics 


In this paper we present a tool that uses comparable corpora to find appropriate 
translation equivalents for expressions that are considered by translators as difficult. For a 
phrase in the source language the tool identifies a range of possible expressions used in 
similar contexts in target language corpora and presents them to the translator as a list of 
suggestions. In the paper we discuss the method and present results of human evaluation 
of the performance of the tool, which highlight it ... 

17 Improving LSA-based summarization with anaphora resolution 

Josef Steinberger, Mijail A. Kabadjov, Massimo Poesio, Olivia Sanchez-Graillet 

October 2005 Proceedings of the conference on Human Language Technology and
Empirical Methods in Natural Language Processing HLT '05
Publisher: Association for Computational Linguistics 


We propose an approach to summarization exploiting both lexical information and the 
output of an automatic anaphoric resolver, and using Singular Value Decomposition (SVD) 
to identify the main terms. We demonstrate that adding anaphoric information results in 
significant performance improvements over a previously developed system, in which only 
lexical terms are used as the input to SVD. However, we also show that how anaphoric 
information is used is crucial: whereas using this information to ad ... 

18 Text Adaptation for Mobile Digital Teletext
Chengyuan Peng, Petri Vuorimaa 

September 2004 Proceedings of the 2004 IEEE/WIC/ACM International Conference on 
Web Intelligence WI '04 

Publisher: IEEE Computer Society 


Small and varying screen sizes of mobile devices pose a big problem for digital Teletext 
service to display its content. It is difficult to display all the text information on a small 
screen, where page scroll or transparent page to live video is not practical. This paper 
presents an adaptive text extraction method which can automatically extract key 
information from original text and keep semantic meanings as close as possible. We 
combine both statistical methods and coarse coding algorithm fro ... 

19 Combining trigram and winnow in Thai OCR error correction

Surapant Meknavin, Boonserm Kijsirikul, Ananlada Chotimongkol, Cholwich Nuttee
August 1998 Proceedings of the 36th annual meeting on Association for
Computational Linguistics - Volume 2, Proceedings of the 17th
international conference on Computational linguistics - Volume 2 
Publisher: Association for Computational Linguistics 
For languages that have no explicit word boundary such as Thai, Chinese and Japanese,
correcting words in text is harder than in English because of additional ambiguities in
locating error words. The traditional method handles this by hypothesizing that every
substring in the input sentence could be an error word and trying to correct all of them. In
this paper, we propose the idea of reducing the scope of spelling correction by focusing 
only on dubious areas in the input sentence. Boundaries of ... 

20 Text classification: Web-page classification through summarization

Dou Shen, Zheng Chen, Qiang Yang, Hua-Jun Zeng, Benyu Zhang, Yuchang Lu, Wei-Ying Ma
July 2004 Proceedings of the 27th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '04 
Publisher: ACM Press 


Web-page classification is much more difficult than pure-text classification due to a large 
variety of noisy information embedded in Web pages. In this paper, we propose a new 
Web-page classification algorithm based on Web summarization for improving the 
accuracy. We first give empirical evidence that ideal Web-page summaries generated by 
human editors can indeed improve the performance of Web-page classification algorithms. 
We then propose a new Web summarization-based classification algorithm ... 

Keywords: content body, web page categorization, web page summarization 
An empirical model of multiword expression decomposability

Timothy Baldwin, Colin Bannard, Takaaki Tanaka, Dominic Widdows 

July 2003 Proceedings of the ACL 2003 workshop on Multiword expressions:
analysis, acquisition and treatment - Volume 18
Publisher: Association for Computational Linguistics 


This paper presents a construction-inspecific model of multiword expression 
decomposability based on latent semantic analysis. We use latent semantic analysis to 
determine the similarity between a multiword expression and its constituent words, and 
claim that higher similarities indicate greater decomposability. We test the model over 
English noun-noun compounds and verb-particles, and evaluate its correlation with 
similarities and hyponymy values in WordNet. Based on mean hyponymy over partitio ... 
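The similarity step the abstract describes can be sketched as cosine similarity between the vector of a multiword expression and the vector of each constituent word. This assumes the vectors have already been produced by latent semantic analysis (the SVD step is not shown); the toy vectors and the names `word_a` and `word_b` are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def constituent_similarities(mwe_vector: list[float],
                             constituent_vectors: dict[str, list[float]]) -> dict[str, float]:
    """Similarity of the multiword-expression vector to each constituent's vector."""
    return {word: cosine(mwe_vector, vec) for word, vec in constituent_vectors.items()}


if __name__ == "__main__":
    # Hypothetical 3-dimensional vectors, assumed to come from an LSA/SVD step.
    mwe = [1.0, 2.0, 2.0]
    parts = {"word_a": [1.0, 2.0, 2.0], "word_b": [2.0, -1.0, 0.0]}
    print(constituent_similarities(mwe, parts))
```

Under the paper's claim, a higher similarity to the constituents would be read as greater decomposability; how per-constituent scores are combined is not shown here.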

Semantic role labeling of nominalized predicates in Chinese

Nianwen Xue 

June 2006 Proceedings of the main conference on Human Language Technology 
Conference of the North American Chapter of the Association of 
Computational Linguistics 

Publisher: Association for Computational Linguistics 


Recent work on semantic role labeling (SRL) has focused almost exclusively on the 
analysis of the predicate-argument structure of verbs, largely due to the lack of human- 
annotated resources for other types of predicates that can serve as training and test data 
for the semantic role labeling systems. However, it is well-known that verbs are not the 
only type of predicates that can take arguments. Most notably, nouns that are nominalized 
forms of verbs and relational nouns generally are also consi ... 



23 tRuEcasIng
Lucian Vlad Lita, Abe Ittycheriah, Salim Roukos, Nanda Kambhatla
July 2003 Proceedings of the 41st Annual Meeting on Association for Computational 
Linguistics - Volume 1 ACL '03 

Publisher: Association for Computational Linguistics 


Truecasing is the process of restoring case information to badly-cased or non-cased text. 
This paper explores truecasing issues and proposes a statistical, language modeling based 
truecaser which achieves an accuracy of ~98% on news articles. Task based evaluation
shows a 26% F-measure improvement in named entity recognition when using truecasing. 
In the context of automatic content extraction, mention detection on automatic speech 
recognition text is also improved by a factor of 8. Truec ... 
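The abstract's truecaser is language-model based; the sketch below is deliberately simpler, a unigram baseline that maps each lowercased token to its most frequent cased form in training text. It is an illustration under that assumption, not the authors' method, and the training tokens are made up.

```python
from collections import Counter, defaultdict


def train_case_model(tokens: list[str]) -> dict[str, str]:
    """Map each lowercased token to its most frequent cased form in the training tokens."""
    forms = defaultdict(Counter)
    for token in tokens:
        forms[token.lower()][token] += 1
    return {low: counts.most_common(1)[0][0] for low, counts in forms.items()}


def truecase(tokens: list[str], model: dict[str, str]) -> list[str]:
    """Restore case token by token; tokens unseen in training are returned unchanged."""
    return [model.get(token.lower(), token) for token in tokens]


if __name__ == "__main__":
    training = ["The", "U.S.", "said", "the", "talks", "in", "New", "York", "the"]
    model = train_case_model(training)
    print(truecase(["the", "u.s.", "talks", "in", "new", "york", "today"], model))
```

Because it ignores context, this baseline cannot, for example, capitalize a sentence-initial "the"; modelling context is what a language-model truecaser adds.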

24 A method for open-vocabulary speech-driven text retrieval

Atsushi Fujii, Katunobu Itou, Tetsuya Ishikawa 

July 2002 Proceedings of the ACL-02 conference on Empirical methods in natural 
language processing - Volume 10 EMNLP '02 

Publisher: Association for Computational Linguistics 


While recent retrieval techniques do not limit the number of index terms, out-of- 
vocabulary (OOV) words are crucial in speech recognition. Aiming at retrieving information 
with spoken queries, we fill the gap between speech recognition and text retrieval in terms 
of the vocabulary size. Given a spoken query, we generate a transcription and detect OOV 
words through speech recognition. We then correspond detected OOV words to terms 
indexed in a target collection to complete the transcription, and ... 

25 Summarization: Web-page summarization using clickthrough data 

Jian-Tao Sun, Dou Shen, Hua-Jun Zeng, Qiang Yang, Yuchang Lu, Zheng Chen
August 2005 Proceedings of the 28th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '05 
Publisher: ACM Press 


Most previous Web-page summarization methods treat a Web page as plain text. However, 
such methods fail to uncover the full knowledge associated with a Web page needed in 
building a high-quality summary, because many of these methods do not consider the 
hidden relationships in the Web. Uncovering the hidden knowledge is important in building 
good Web-page summarizers. In this paper, we extract the extra knowledge from the 
clickthrough data of a Web search engine to improve Web-page summarization ... 

Keywords: clickthrough data, generic web-page summarization, latent semantic analysis, 
thematic lexicon 



CoNLL-2000 papers: increasing our ignorance of language: identifying language
structure in an unknown "signal"
John Elliott, Eric Atwell, Bill Whyte 

September 2000 Proceedings of the 2nd workshop on Learning language in logic and 
the 4th conference on Computational natural language learning - 
Volume 7 

Publisher: Association for Computational Linguistics 


This paper describes algorithms and software developed to characterise and detect generic 
intelligent language-like features in an input signal, using natural language learning 
techniques: looking for characteristic statistical "language-signatures" in test corpora. As a 
first step towards such species-independent language-detection, we present a suite of 
programs to analyse digital representations of a range of data, and use the results to 
extrapolate whether or not there are language-like stru ... 

NEXUS: a linguistic technique for precoordination
R. A. Benson 

September 1969 Proceedings of the 1969 conference on Computational linguistics 
Publisher: Association for Computational Linguistics


A method for automatically precoordinating index terms was devised to form combinations 
of terms which are stored as subject headings. A computer program accepts lists of auto- 
indexed terms and by applying linguistic and sequence rules combines appropriate terms, 
thereby effecting improved searchability of an information storage and retrieval system. A 
serious failing exists in many indexing systems in that index terms authorized for use are 
too general for use by technically-knowledgeable search ... 



Fast and quasi-natural language search for gigabytes of Chinese texts
Lee-Feng Chien

July 1995 Proceedings of the 18th annual international ACM SIGIR conference on 
Research and development in information retrieval SIGIR '95 

Publisher: ACM Press 

A gradual refinement model for a robust Thai morphological analyzer

Kawtrakul Asanee, Thumkanon Chalatip, Jamjanya Thitima, Muangyunnan Parinee, Poolwan 
Kritsada, Inagaki Yasuyoshi 

August 1996 Proceedings of the 16th conference on Computational linguistics - 
Volume 2 

Publisher: Association for Computational Linguistics 


This work attempts to provide a robust Thai morphological analyzer which can 
automatically assign the correct part-of-speech tag to the correct word with time and 
space efficiency. Instead of using a corpus based approach which requires a large amount of
training data and validation data, a new simple hybrid technique which incorporates 
heuristic, syntactic and semantic knowledge is proposed. To implement this technique, a 
three-stage approach is adopted to the gradual refinement module. It consi ... 



Special issue on machine learning methods for text and images: Word-sequence kernels

Nicola Cancedda, Eric Gaussier, Cyril Goutte, Jean Michel Renders 
March 2003 The Journal of Machine Learning Research, Volume 3
Publisher: MIT Press 


We address the problem of categorising documents using kernel-based methods such as 
Support Vector Machines. Since the work of Joachims (1998), there is ample experimental 
evidence that SVM using the standard word frequencies as features yield state-of-the-art 
performance on a number of benchmark problems. Recently, Lodhi et al. (2002) proposed 
the use of string kernels, a novel way of computing document similarity based on matching
non-consecutive subsequences of characters. In this arti ... 
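The string kernel of Lodhi et al. (2002) cited above compares two sequences by their shared, possibly non-consecutive subsequences of length n, each occurrence weighted by λ raised to the length of the window it spans. The sketch below computes that definition by brute force, which is exponential and only suitable for short inputs; because it accepts any sequence, passing lists of words instead of strings gives a word-sequence variant in the spirit of the article's title. The function names and the default λ = 0.5 are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(seq, n: int, lam: float) -> dict:
    """Map each length-n subsequence u to the sum of lam**span over its occurrences in seq.

    span is the window length from the first to the last matched position,
    so gaps inside an occurrence are penalized by the decay factor lam (0 < lam <= 1).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    phi = defaultdict(float)
    for idx in combinations(range(len(seq)), n):
        span = idx[-1] - idx[0] + 1
        phi[tuple(seq[i] for i in idx)] += lam ** span
    return dict(phi)


def subsequence_kernel(s, t, n: int, lam: float = 0.5) -> float:
    """K_n(s, t): inner product of the gap-weighted subsequence features (brute force)."""
    phi_s = subsequence_features(s, n, lam)
    phi_t = subsequence_features(t, n, lam)
    return sum(weight * phi_t[u] for u, weight in phi_s.items() if u in phi_t)


if __name__ == "__main__":
    print(subsequence_kernel("cat", "car", 2))  # character sequences
    print(subsequence_kernel(["machine", "learning", "methods"],
                             ["learning", "methods", "for", "text"], 2))  # word sequences
```

For the classic example, `subsequence_kernel("cat", "cat", 2)` equals 2λ⁴ + λ⁶ (0.140625 at λ = 0.5). Practical implementations replace the enumeration with a dynamic-programming recursion and usually normalize by √(K(s,s)·K(t,t)); neither is shown here.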
Lisa F. Rau, Paul S. Jacobs
September 1991 Proceedings of the 14th annual international ACM SIGIR conference
on Research and development in information retrieval SIGIR '91
Publisher: ACM Press 




Statistical language modeling: Hypothesizing word association from untagged text
Tomoyoshi Matsukawa 

March 1993 Proceedings of the workshop on Human Language Technology HLT '93 
Publisher: Association for Computational Linguistics 


This paper reports a new method for suggesting word associations, based on a greedy 
algorithm that employs Chi-square statistics on joint frequencies of pairs of word groups 
compared against chance co-occurrence. The benefits of this new approach are: 1) we can 
consider even low frequency words and word pairs, and 2) word groups and word 
associations can be automatically generated. The method provided 87% accuracy in 
hypothesizing word associations for unobserved combinations of words in Japanes ... 
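The statistic behind the method, Pearson's chi-square on a 2×2 table of joint frequencies compared against chance co-occurrence, can be sketched as follows. The paper applies it to pairs of word groups inside a greedy algorithm; only the statistic is shown here, and the argument names are hypothetical.

```python
def chi_square_2x2(both: int, only_a: int, only_b: int, neither: int) -> float:
    """Pearson chi-square for a 2x2 table of co-occurrence counts.

    both   = count(A and B)     only_a  = count(A, not B)
    only_b = count(B, not A)    neither = count(neither A nor B)
    """
    n = both + only_a + only_b + neither
    row_a = both + only_a          # occurrences of A
    row_not_a = only_b + neither   # non-occurrences of A
    col_b = both + only_b          # occurrences of B
    col_not_b = only_a + neither   # non-occurrences of B
    denominator = row_a * row_not_a * col_b * col_not_b
    if denominator == 0:
        return 0.0  # a zero margin leaves the statistic undefined; treat as no evidence
    return n * (both * neither - only_a * only_b) ** 2 / denominator


if __name__ == "__main__":
    print(chi_square_2x2(10, 0, 0, 10))  # perfectly associated pair
    print(chi_square_2x2(5, 5, 5, 5))    # co-occurrence exactly at chance
```

With one degree of freedom, values above about 3.84 indicate association at the 5% level.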



Semantics: Generalizing automatically generated selectional patterns

Ralph Grishman, John Sterling 

August 1994 Proceedings of the 15th conference on Computational linguistics -
Volume 2

Publisher: Association for Computational Linguistics 


Frequency information on co-occurrence patterns can be automatically collected from a 
syntactically analyzed corpus; this information can then serve as the basis for selectional 
constraints when analyzing new text from the same domain. This information, however, is 
necessarily incomplete. We report on measurements of the degree of selectional coverage 
obtained with different sizes of corpora. We then describe a technique for using the corpus 
to identify selectionally similar terms, and for using ... 

Abbreviation Expansion in Schema Matching and Web Integration
L. Ratinov, E. Gudes 

September 2004 Proceedings of the 2004 IEEE/WIC/ACM International Conference on
Web Intelligence WI '04 

Publisher: IEEE Computer Society 


Schema matching is a problem of finding correspondences, particularly equivalence 
relationships across schemas. The problem has a particular significance in integrating web 
repositories, as distributed databases over the web become increasingly popular. Most of
the existing prototypes use schema level lexical information for schema matching. 
However, most of them perform rather poorly on real-world problems due to the 
abundance of abbreviations in real-world schemas. For example, none of the le ... 

Co-ordinative ellipsis in Russian texts: problems of description and restoration 
Igor A. Bolshakov 

August 1988 Proceedings of the 12th conference on Computational linguistics -
Volume 1

Publisher: Association for Computational Linguistics


Russian elliptic constructions are examined from the point of view of syntactic analysis. 
Reciprocal elements in a co-ordinative elliptic sentence are exposed and possible types of 
their similarity are explored. Linear formulae of ellipsis for most textual cases are 
constructed and statistics of their use is discussed. As a result the main steps of ellipsis ...

37 Automatic learning of language model structure

Kevin Duh, Katrin Kirchhoff 

August 2004 Proceedings of the 20th international conference on Computational
Linguistics COLING '04
Publisher: Association for Computational Linguistics 


Statistical language modeling remains a challenging task, in particular for morphologically 
rich languages. Recently, new approaches based on factored language models have been 
developed to address this problem. These models provide principled ways of including ...

Spontaneous human utterances in the context of human-human and human-machine
dialogs are rampant with dysfluencies, and speech repairs. Furthermore, when recognized 
using a speech recognizer, these utterances produce a sequence of words with no ...

We propose an algorithm to automatically induce the morphology of inflectional languages
using only text corpora and no human input. Our algorithm combines cues from 
orthography, semantics, and syntactic distributions to induce morphological relationships 
in German, Dutch, and English. Using CELEX as a gold standard for evaluation, we show 
our algorithm to be an improvement over any knowledge-free algorithm yet proposed. 